Conversational spontaneous speech synthesis using average voice model
Authors
Abstract
This paper describes conversational spontaneous speech synthesis based on hidden Markov models (HMMs). To reduce the amount of data required for model training, we utilize an average-voice-based speech synthesis framework, which has been shown to be effective for synthesizing speech in an arbitrary speaker’s voice from a small amount of training data. We examine several kinds of average voice model trained on reading-style speech and/or conversation-style speech. We also examine an appropriate utterance unit for conversational speech synthesis. Experimental results show that the proposed two-stage model adaptation method improves the quality of synthetic conversational speech.
Similar resources
Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection
Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutral read-aloud speech. However, synthetic speech generated using neutral read-aloud data lacks the attitude, intention and spontaneity associated with everyday conversations. Unit selection is heavily data dependent and thus in order to simulate human conversational speech, or create synthetic...
Utilising spontaneous conversational speech in HMM-based speech synthesis
Spontaneous conversational speech has many characteristics that are currently not well modelled in unit selection and HMM-based speech synthesis. But in order to build synthetic voices more suitable for interaction we need data that exhibits more conversational characteristics than the generally used read aloud sentences. In this paper we will show how carefully selected utterances from a spont...
Accounting for Voice-Quality Variation
This paper proposes a two-layer model of the information carried in the speech signal. It attempts to define the role of prosody with a wider scope than has previously been considered in speech synthesis or linguistic research, by taking into account affective information in addition to that of linguistic content. The work is based on analysis of a large corpus of spontaneous conversational spe...
Evaluating expressive speech synthesis from audiobooks in conversational phrases
CNGL, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland ({eva.szekely|mohamed.abou-zleikha}@ucdconnect.ie, {joao.cabral|peter.cahill|julie.berndsen}@ucd.ie). Abstract: Audiobooks are a rich resource of large quantities of natural sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice sty...
Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual...